In recent years, deep learning (DL) models have demonstrated remarkable achievements on non-trivial tasks such as speech recognition and natural language understanding. One of the significant contributors to their success is the proliferation of end devices, which act as a catalyst by providing data for data-hungry DL models. However, performing DL training and inference remains computationally demanding. Usually, central cloud servers are used for the computation, but this opens up other significant challenges, such as high latency, increased communication costs, and privacy concerns. To mitigate these drawbacks, considerable efforts have been made to push the processing of DL models to edge servers, and the confluence of DL and edge computing has given rise to edge intelligence (EI). This survey paper focuses primarily on the fifth level of EI, called the all in-edge level, where DL training and inference (deployment) are performed solely by edge servers. All in-edge is suitable when the end devices have low computing resources (e.g., Internet-of-Things devices) and when requirements such as latency and communication cost matter, as in mission-critical applications such as health care. Firstly, this paper presents all in-edge computing architectures, including centralized, decentralized, and distributed. Secondly, it presents enabling technologies, such as model parallelism and split learning, which facilitate DL training and deployment at edge servers. Thirdly, model adaptation techniques based on model compression and conditional computation are described, because standard cloud-based DL deployment cannot be applied directly at the all in-edge level due to its limited computational resources. Fourthly, the paper discusses eleven key performance metrics for efficiently evaluating DL at the all in-edge level. Finally, several open research challenges in the area of all in-edge are presented.
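To make one of the enabling technologies above concrete, the following is a minimal sketch of split learning, assuming a PyTorch toy model partitioned between an end device and an edge server; the layer sizes, split point, and names are illustrative assumptions rather than anything prescribed by the survey.

```python
# Minimal sketch of split learning between an end device and an edge server.
# Layer sizes, the split point, and names are illustrative assumptions.
import torch
import torch.nn as nn

# Device-side portion: a few cheap layers producing "smashed" activations.
device_model = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
# Edge-side portion: the bulk of the network, hosted on an edge server.
edge_model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))

x = torch.randn(8, 32)          # a mini-batch of raw data held on the device
y = torch.randint(0, 10, (8,))  # labels

# Forward pass: only intermediate activations cross the device-edge boundary.
smashed = device_model(x)
logits = edge_model(smashed)
loss = nn.functional.cross_entropy(logits, y)

# Backward pass: gradients flow back through the cut layer to the device side.
loss.backward()
```

The point of the split is that raw data never leaves the device; only the cut-layer activations and their gradients are exchanged with the edge server.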
Search and Rescue (SAR) missions in remote environments often employ autonomous multi-robot systems that learn, plan, and execute a combination of local single-robot control actions, group primitives, and global mission-oriented coordination and collaboration. Often, SAR coordination strategies are manually designed by human experts who can remotely control the multi-robot system and enable semi-autonomous operations. However, in remote environments where connectivity is limited and human intervention is often not possible, decentralized collaboration strategies are needed for fully-autonomous operations. Nevertheless, decentralized coordination may be ineffective in adversarial environments due to sensor noise, actuation faults, or manipulation of inter-agent communication data. In this paper, we propose an algorithmic approach based on adversarial multi-agent reinforcement learning (MARL) that allows robots to efficiently coordinate their strategies in the presence of adversarial inter-agent communications. In our setup, the objective of the multi-robot team is to discover targets strategically in an obstacle-strewn geographical area by minimizing the average time needed to find the targets. It is assumed that the robots have no prior knowledge of the target locations, and they can interact with only a subset of neighboring robots at any time. Based on the centralized training with decentralized execution (CTDE) paradigm in MARL, we utilize a hierarchical meta-learning framework to learn dynamic team-coordination modalities and discover emergent team behavior under complex cooperative-competitive scenarios. The effectiveness of our approach is demonstrated on a collection of prototype grid-world environments with different specifications of benign and adversarial agents, target locations, and agent rewards.
By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing or speech recognition, it remains to be shown in robotics, where the generalization capabilities of the models are particularly critical due to the difficulty of collecting real-world robotic data. We argue that one of the keys to the success of such general robotic models lies with open-ended task-agnostic training, combined with high-capacity architectures that can absorb all of the diverse, robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks. The project's website and videos can be found at robotics-transformer.github.io
This work addresses cross-view camera pose estimation, i.e., determining the 3-DoF camera pose of a given ground-level image w.r.t. an aerial image of the local area. We propose SliceMatch, which consists of ground and aerial feature extractors, feature aggregators, and a pose predictor. The feature extractors extract dense features from the ground and aerial images. Given a set of candidate camera poses, the feature aggregators construct a single ground descriptor and a set of rotational equivariant pose-dependent aerial descriptors. Notably, our novel aerial feature aggregator has a cross-view attention module for ground-view guided aerial feature selection, and utilizes the geometric projection of the ground camera's viewing frustum on the aerial image to pool features. The efficient construction of aerial descriptors is achieved by using precomputed masks and by re-assembling the aerial descriptors for rotated poses. SliceMatch is trained using contrastive learning, and pose estimation is formulated as a similarity comparison between the ground descriptor and the aerial descriptors. SliceMatch outperforms the state-of-the-art by 19% and 62% in median localization error on the VIGOR and KITTI datasets, while running at 3x the FPS of the fastest baseline.
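The final similarity-comparison step can be pictured with a short sketch, assuming cosine similarity between an L2-normalized ground descriptor and a bank of pose-dependent aerial descriptors; the descriptor dimension, number of candidate poses, and random values are illustrative assumptions, not SliceMatch's exact formulation.

```python
# Sketch of pose selection by similarity comparison (illustrative values only).
import numpy as np

rng = np.random.default_rng(0)
d = 256                                    # assumed descriptor dimension
ground_desc = rng.normal(size=d)           # single ground descriptor
aerial_descs = rng.normal(size=(1000, d))  # one descriptor per candidate pose

def l2_normalize(v, axis=-1):
    return v / np.linalg.norm(v, axis=axis, keepdims=True)

# Cosine similarity between the ground descriptor and every pose-dependent
# aerial descriptor; the best-scoring candidate pose is the estimate.
scores = l2_normalize(aerial_descs) @ l2_normalize(ground_desc)
best_pose_idx = int(np.argmax(scores))
print(best_pose_idx, scores[best_pose_idx])
```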
This paper is a technical overview of DeepMind and Google's recent work on reinforcement learning for controlling commercial cooling systems. Building on expertise that began with cooling Google's data centers more efficiently, we recently conducted live experiments on two real-world facilities in partnership with Trane Technologies, a building management system provider. These live experiments had a variety of challenges in areas such as evaluation, learning from offline data, and constraint satisfaction. Our paper describes these challenges in the hope that awareness of them will benefit future applied RL work. We also describe the way we adapted our RL system to deal with these challenges, resulting in energy savings of approximately 9% and 13% respectively at the two live experiment sites.
Grammatical Error Correction (GEC) is the task of automatically detecting and correcting errors in text. The task not only includes the correction of grammatical errors, such as missing prepositions and mismatched subject-verb agreement, but also orthographic and semantic errors, such as misspellings and word choice errors, respectively. The field has seen significant progress in the last decade, motivated in part by a series of five shared tasks, which drove the development of rule-based methods, statistical classifiers, statistical machine translation, and finally neural machine translation systems, which represent the current dominant state of the art. In this survey paper, we condense the field into a single article and first outline some of the linguistic challenges of the task, introduce the most popular datasets that are available to researchers (for both English and other languages), and summarise the various methods and techniques that have been developed, with a particular focus on artificial error generation. We next describe the many different approaches to evaluation as well as concerns surrounding metric reliability, especially in relation to subjective human judgements, before concluding with an overview of recent progress and suggestions for future work and remaining challenges. We hope that this survey will serve as a comprehensive resource for researchers who are new to the field or who want to be kept apprised of recent developments.
The computational complexity of the self-attention mechanism in Transformer models significantly limits their ability to generalize over long temporal durations. Memory-augmentation, or the explicit storing of past information in external memory for subsequent predictions, has become a constructive avenue for mitigating this limitation. We argue that memory-augmented Transformers can benefit substantially from considering insights from the memory literature in humans. We detail an approach for integrating evidence from the human memory system through the specification of cross-domain linking hypotheses. We then provide an empirical demonstration to evaluate the use of surprisal as a linking hypothesis, and further identify the limitations of this approach to inform future research.
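For readers unfamiliar with surprisal, the sketch below shows the standard definition, surprisal(w_t) = -log p(w_t | context), computed from a model's next-token logits; the toy vocabulary and logit values are assumptions made purely for illustration.

```python
# Sketch: token surprisal, -log p(token | context), from next-token logits.
import numpy as np

def surprisal(logits, token_id):
    """Surprisal (in nats) of the observed token under softmax(logits)."""
    log_probs = logits - np.logaddexp.reduce(logits)  # log-softmax
    return -log_probs[token_id]

logits = np.array([2.0, 0.5, -1.0, 0.0])  # assumed scores over a tiny vocabulary
print(surprisal(logits, token_id=0))      # low surprisal: the model expected it
print(surprisal(logits, token_id=2))      # high surprisal: an unexpected token
```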
Cross-view geolocalization localizes an agent within a search area by matching images taken from a ground-view camera to overhead images taken from satellites or aircraft. Although the viewpoint disparity between ground and overhead images makes cross-view geolocalization challenging, significant progress has been made under the assumption that the ground agent has access to a panoramic camera. For example, our prior work (WAG) introduced changes in search area discretization, training loss, and particle filter weighting that enabled city-scale panoramic cross-view geolocalization. However, panoramic cameras are not widely used on existing robotic platforms due to their complexity and cost. Non-panoramic cross-view geolocalization is more applicable to robotics, but it is also more challenging. This paper presents Restricted FOV Wide-Area Geolocalization (ReWAG), a cross-view geolocalization approach that generalizes WAG for use with standard, non-panoramic ground cameras by creating pose-aware embeddings and providing a strategy for incorporating particle poses into the Siamese network. ReWAG is a neural network and particle filter system that is able to globally localize a mobile agent in a GPS-denied environment with only odometry and a 90-degree FOV camera, achieving localization accuracy similar to that attained with a panoramic camera and improving localization accuracy by a factor of 100 compared to a baseline vision transformer (ViT) approach. A video highlight demonstrating convergence over a test path tens of kilometers long is available at https://youtu.be/u_obqrt8qce.
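A minimal sketch of the particle-filter measurement update in this kind of system is given below, assuming a stand-in embedding function and a simple similarity-to-weight mapping; none of these choices reproduce ReWAG's actual networks or weighting scheme.

```python
# Sketch of a particle-filter measurement update driven by embedding similarity.
# The embedding function, weighting, and resampling choices are illustrative.
import numpy as np

rng = np.random.default_rng(0)
particles = rng.uniform(0, 1000, size=(500, 2))          # candidate (x, y) poses; heading omitted
weights = np.full(len(particles), 1.0 / len(particles))

def embed_pose(pose):
    # Stand-in for a pose-aware aerial embedding produced by a trained network.
    return np.tanh(pose / 100.0)

ground_embedding = np.array([0.3, -0.2])  # stand-in for the ground-image embedding

# Measurement update: particles whose aerial embedding matches the ground
# embedding receive higher weight.
pose_embeddings = np.array([embed_pose(p) for p in particles])
sims = -np.linalg.norm(pose_embeddings - ground_embedding, axis=1)
weights *= np.exp(sims)
weights /= weights.sum()

# Resample particles in proportion to their weights.
idx = rng.choice(len(particles), size=len(particles), p=weights)
particles = particles[idx]
```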
This manuscript addresses the simultaneous problems of predicting all-cause inpatient readmission or death after hospital discharge and quantifying the impact of discharge placement in preventing these adverse events. To this end, we developed an inherently interpretable multilevel Bayesian modeling framework inspired by the piecewise linearity of ReLU-activated deep neural networks. Within a survival model, we explicitly adjust for confounding to quantify local average treatment effects of discharge placement interventions. We trained the model on a 5% sample of Medicare beneficiaries from 2008 and 2011, and then tested it on claims from 2012. Evaluated on classification accuracy for 30-day all-cause unplanned readmission (defined using the official CMS methodology), the model performed similarly to XGBoost, logistic regression (after feature engineering), and a Bayesian deep neural network trained on the same data. Tested on held-out future data for the 30-day classification task, the model achieved an AUROC of approximately 0.76 and an AUPRC of approximately 0.50 (relative to the overall positive rate in the test data), demonstrating that one need not sacrifice interpretability for accuracy. In addition, the model achieved a test AUROC of 0.78 when classifying 90-day all-cause unplanned readmission or death. We easily peer inside our inherently interpretable model and summarize its main findings. Furthermore, we demonstrate how SHAP, a black-box post-hoc explainer tool, can generate explanations that are not supported by the fitted model and that, if taken at face value, do not provide enough context to make the model actionable.
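To make the reported numbers concrete, the sketch below shows how AUROC and AUPRC are typically computed on held-out predictions with scikit-learn; the labels and risk scores are placeholders, not the study's data, and AUPRC is read against the overall positive rate as in the abstract.

```python
# Sketch: AUROC and AUPRC for a binary readmission/death classifier on
# held-out data. Labels and risk scores here are placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)                          # 1 = readmitted or died within 30 days
y_score = np.clip(0.3 * y_true + 0.7 * rng.random(1000), 0, 1)  # toy risk scores

auroc = roc_auc_score(y_true, y_score)
auprc = average_precision_score(y_true, y_score)  # compare against the positive rate
print(f"AUROC={auroc:.2f}  AUPRC={auprc:.2f}  positive rate={y_true.mean():.2f}")
```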
To recommend relevant items for seasonal retail events, we rely on item retrieval from marketplace listings. To expand query coverage through feedback, we discuss keyword expansion candidate selection using word-embedding similarity, as well as an augmented TF-IDF formula for scoring the expanded words in search ranking.
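Because the abstract is terse, the sketch below illustrates one plausible reading, assuming expansion candidates are chosen by word-embedding cosine similarity and expanded terms enter ranking through a down-weighted TF-IDF contribution; the toy vectors, discount factor alpha, and scoring formula are assumptions, not the paper's exact method.

```python
# Sketch: keyword expansion by embedding similarity plus a TF-IDF style score
# in which expanded terms are down-weighted. All values are illustrative.
import math
import numpy as np

embeddings = {                        # toy word vectors (assumed)
    "pumpkin": np.array([0.9, 0.1]),
    "squash":  np.array([0.85, 0.2]),
    "candy":   np.array([0.1, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def expand(query_term, k=1):
    """Pick the k most similar vocabulary words as expansion candidates."""
    scored = [(w, cosine(embeddings[query_term], v))
              for w, v in embeddings.items() if w != query_term]
    return [w for w, _ in sorted(scored, key=lambda t: -t[1])[:k]]

def score(doc_terms, query_terms, expanded_terms, n_docs, doc_freq, alpha=0.5):
    """TF-IDF score where expanded terms contribute with discount factor alpha."""
    total = 0.0
    weighted = [(t, 1.0) for t in query_terms] + [(t, alpha) for t in expanded_terms]
    for term, weight in weighted:
        tf = doc_terms.count(term)
        idf = math.log((1 + n_docs) / (1 + doc_freq.get(term, 0)))
        total += weight * tf * idf
    return total

expanded = expand("pumpkin")              # e.g. ["squash"]
doc = ["seasonal", "squash", "decor"]
print(expanded, score(doc, ["pumpkin"], expanded,
                      n_docs=100, doc_freq={"squash": 5, "pumpkin": 8}))
```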